-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chunk Data Model supports per-chunk service event mapping #6744
Chunk Data Model supports per-chunk service event mapping #6744
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/efm-recovery #6744 +/- ##
=====================================================
Coverage 41.71% 41.72%
=====================================================
Files 2030 2031 +1
Lines 180459 180552 +93
=====================================================
+ Hits 75285 75329 +44
- Misses 98978 99031 +53
+ Partials 6196 6192 -4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
strategic / conceptual thoughts
I mean is this is only a temporary solution, I am happy and we can skip most of question 3. |
Summary of Discussion with @AlexHentschelChange of
|
Summary of discussion with @zhangchiqingLet's call v0 the old version and v1 the new version. Leo pointed out the v0 nodes will fail to validate data models produced by v1 nodes. In particular, this step:
will not work, if v1 nodes immediately begin produce v1 chunks with non-nil This means the upgrade needs to be split into multiple steps, because v0 software will be unable to read v1 data models with non-nil new fields (will produce different hashes). Instead, we need:
Because of the above additional complexity, I think we should revert the |
I think we can still do with just one HCU, but we might need 2 rolling upgrades. This is the steps, basically Step 1 and Step 3 are rolling upgrades, Step 2 is an protocol HCU:
Once an protocol HCU event is emitted, then:
|
@zhangchiqing I agree that would work. Though, I suspect we will prefer fewer software upgrades, so we depend less on actions by node operators, but it's good to know that we have flexibility. I have written a version of the upgrade plan based on our last discussion (with 2 HCUs and 1 rolling upgrade) here. The upgrade process is fairly simple - there is also an enumeration of versions, which I hope can be generalized beyond this specific example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I struggled with understanding the Chunk Verifier Tests. I think we can fix this with some documentation. They are tests, its fine if the documentation is repetitive.
// (2) Otherwise, ServiceEventCount must be non-nil. | ||
// Within an ExecutionResult, all chunks must use either representation (1) or (2), not both. | ||
ServiceEventCount *uint16 | ||
BlockID Identifier // Block id of the execution result this chunk belongs to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we maybe want to move this to the beginning of the Chunk Body? I think conceptually, that would be more consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but RLP encoding depends on field ordering within structs, so doing this would change the ID computation (unless we over-rode again, using the RLP encoding).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test case to validate this a61e30e
module/chunks/chunkVerifier_test.go
Outdated
// Tests the case where a service event is emitted outside the system chunk | ||
// and the event computed by the VN does not match the Result. | ||
// NOTE: this test case relies on the ordering of transactions in generateCollection. | ||
func (s *ChunkVerifierTestSuite) TestServiceEventsMismatch_NonSystemChunk() { | ||
script := "service event mismatch in non-system chunk" | ||
meta := s.GetTestSetup(s.T(), script, false, true) | ||
vch := meta.RefreshChunkData(s.T()) | ||
|
||
// modify the list of service events produced by FVM | ||
// EpochSetup event is expected, but we emit EpochCommit here resulting in a chunk fault | ||
epochCommitServiceEvent, err := convert.ServiceEvent(testChain, epochCommitEvent) | ||
require.NoError(s.T(), err) | ||
|
||
s.snapshots[script] = &snapshot.ExecutionSnapshot{} | ||
s.outputs[script] = fvm.ProcedureOutput{ | ||
ComputationUsed: computationUsed, | ||
ConvertedServiceEvents: flow.ServiceEventList{*epochCommitServiceEvent}, | ||
Events: meta.ChunkEvents[:3], | ||
} | ||
|
||
_, err = s.verifier.Verify(vch) | ||
|
||
assert.Error(s.T(), err) | ||
assert.True(s.T(), chunksmodels.IsChunkFaultError(err)) | ||
assert.IsType(s.T(), &chunksmodels.CFInvalidServiceEventsEmitted{}, err) | ||
} | ||
|
||
// Tests that service events are checked, when they appear outside the system chunk. | ||
// NOTE: this test case relies on the ordering of transactions in generateCollection. | ||
func (s *ChunkVerifierTestSuite) TestServiceEventsAreChecked_NonSystemChunk() { | ||
script := "service event in non-system chunk" | ||
meta := s.GetTestSetup(s.T(), script, false, true) | ||
vch := meta.RefreshChunkData(s.T()) | ||
|
||
// setup the verifier output to include the correct data for the service events | ||
output := generateDefaultOutput() | ||
output.ConvertedServiceEvents = meta.ServiceEvents | ||
output.Events = meta.ChunkEvents[:3] // 2 default events + 1 service event | ||
s.outputs[script] = output | ||
|
||
spockSecret, err := s.verifier.Verify(vch) | ||
assert.NoError(s.T(), err) | ||
assert.NotNil(s.T(), spockSecret) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am struggling convincing myself that we really test the correct edge case. In my mind, we are trying to test the following complementary aspects, with as little gaps as possible::
-
Situation: a non-service chunk containing a service event (honest). Expected: pass. I think this is tested in
TestServiceEventsAreChecked_NonSystemChunk
-
Exactly the same as situation 1, but that the
ConvertedServiceEvents
is different.I got confused here, because in test
TestServiceEventsMismatch_NonSystemChunk
too many lines of code are different, which each could be symptom a chunk fault. For me, it would really help ifTestServiceEventsMismatch_NonSystemChunk
mirroredTestServiceEventsAreChecked_NonSystemChunk
with as little changes as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, it would really help if TestServiceEventsMismatch_NonSystemChunk mirrored TestServiceEventsAreChecked_NonSystemChunk with as little changes as possible.
They are different because the existing testing infrastructure has very different code-paths for the system chunk and other chunks. Unfortunately I don't think it is feasible to make them more similar without a larger refactor of this test file.
module/chunks/chunkVerifier_test.go
Outdated
s.snapshots[script] = &snapshot.ExecutionSnapshot{} | ||
s.outputs[script] = fvm.ProcedureOutput{ | ||
ComputationUsed: computationUsed, | ||
ConvertedServiceEvents: flow.ServiceEventList{*epochCommitServiceEvent}, | ||
Events: meta.ChunkEvents[:3], | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this seems significantly different from Specifically, I don
flow-go/module/chunks/chunkVerifier_test.go
Lines 326 to 330 in 176100f
// setup the verifier output to include the correct data for the service events | |
output := generateDefaultOutput() | |
output.ConvertedServiceEvents = meta.ServiceEvents | |
output.Events = meta.ChunkEvents[:3] // 2 default events + 1 service event | |
s.outputs[script] = output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your comment got cut off. The portion of code you linked is constructing an expected output for a transaction in which a service event was emitted (outside the system chunk).
- Line 328 is assigning the default service events for the output for a non-system-chunk transaction, as the expected output
- Line 329 is pulling out the 3 events associated with the transaction, and adding it to the expected output
- Line 330 is inserting the expected output into the map, so the verifier will consider this the canonical output for the transaction
// setup the verifier output to include the correct data for the service events | ||
output := generateDefaultOutput() | ||
output.ConvertedServiceEvents = meta.ServiceEvents | ||
output.Events = meta.ChunkEvents[:3] // 2 default events + 1 service event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we trimming the meta.ChunkEvents
here? The chunk events can be more, can't they?
The testing framework has lots of layers, which I am struggling with. Though, to the best of my limited understanding, honest chunk data should be consistent with the verifier's local output
. The chunk data here is represented by meta
and the derived vch
. So to me, the following assignment would make sense:
output.Events = meta.ChunkEvents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output is the expected output for one transaction. The test framework adds 2 events (contents of eventsList
) as the expected output for every transaction by default. If specified (new option in this PR), it will additionally add 1 service event (3 total).
Co-authored-by: Alexander Hentschel <[email protected]>
…ow-go into jord/6622-chunk-service-events
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests. LGTM
Co-authored-by: Leo Zhang <[email protected]>
This PR adds support for specifying which service events were emitted in which chunk, by modifying the
ChunkBody
data structure in a backward compatible manner. Addresses #6622.Changes
ServiceEventCount
field toChunkBody
:ChunkBody
will unmarshal into a struct with a nilServiceEventCount
. We define a chunk with nilServiceEventCount
to have the same semantics as before the field existed: if any service events were emitted, then they were emitted from the system chunk.ChunkBody
will always have non-nilServiceEventCount
Upgrade Notes
Source of truth for upgrade plans (still WIP): https://flowfoundation.notion.site/EFM-Recovery-Release-Upgrade-Plan-WIP-14d1aee1232480228a87e43933815285?pvs=4
Note: Implementation changes associated with the upgrade process will be implemented separately, when the upgrade process is fully specified (see #6777).
#6783 captures changes required around the upgrade behaviour.
To Do Before Merging
ExecutionResult
versionsChunkBody
backward-compatibility #6773ServiceEventCount
field)This PR replaces two prior approaches, implemented in part #6629 and #6730.